
Communications Medicine

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match Communications Medicine's content profile, based on 85 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.

1
Human and viral whole genome sequencing identify HPV and APOBEC as oncogenic drivers in sinonasal squamous cell carcinoma

Chong, H. B.; Bryan, M. E.; Lin, M.; Faquin, W. C.; Mirabello, L. J.; Mishra, S. K.; Lewis, J. S.; Lawrence, M. S.; Faden, D. L.

2026-02-09 otolaryngology 10.64898/2026.02.04.26345593 medRxiv
Top 0.1%
8.8%

Sinonasal squamous cell carcinoma (SNSCC) is an aggressive head and neck cancer of the sinonasal cavity which has not benefitted from therapeutic advances over decades1. Though historically attributed to inhaled carcinogens such as hardwood dust and tobacco smoking2, SNSCC is incidentally associated with human papillomavirus (HPV)3,4. Importantly, HPV is the primary oncogenic driver of >80% of anatomically adjacent oropharyngeal cancers5. While viral status drives clinical staging and treatment guidelines in these malignancies6,7, the potentially oncogenic consequences and prognostic value of host-virus interactions in SNSCC remain incompletely defined. Here, through paired host and viral whole-genome sequencing (WGS), we map the genomic footprint of HPV in SNSCC. Strikingly, lesser studied strains such as HPV45, HPV51, and HPV39 constitute driver infections in this rare but clinically credentialed cancer, where extrachromosomal DNA (ecDNA)-associated viral integration and APOBEC mutagenesis are shown to underpin somatic tumor evolution.

Statement of Significance: Paired host and viral whole-genome sequencing of SNSCC nominates HPV as a primary oncogenic driver of SNSCC. HPV-human ecDNA amplicons harboring noncanonical strains such as HPV45 and HPV51 mediate viral carcinogenesis. Routine clinical diagnostic HPV panels should be expanded to capture the activity of lesser studied strains.
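APOBEC mutagenesis of the kind mentioned above is conventionally recognised by C>T and C>G substitutions in a TpCpW context (W = A or T). A minimal sketch of that classification, assuming each somatic call is already annotated with its reference trinucleotide; the helper names are hypothetical and this is not the authors' pipeline, which would more typically rely on full mutational-signature decomposition.

```python
# Hypothetical helper (not from the preprint): flag single-base substitutions
# in the canonical APOBEC context TCW>TTW or TCW>TGW (W = A or T),
# counting mutations reported on either strand.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_apobec_context(trinuc: str, alt: str) -> bool:
    """trinuc is the reference 5'-XYZ-3' context; alt replaces the middle base."""
    trinuc, alt = trinuc.upper(), alt.upper()
    if trinuc[1] in "AG":
        # Re-express purine-centred mutations on the pyrimidine strand.
        trinuc, alt = revcomp(trinuc), COMPLEMENT[alt]
    return trinuc[0] == "T" and trinuc[1] == "C" and trinuc[2] in "AT" and alt in "TG"


def apobec_fraction(mutations):
    """Fraction of (trinucleotide, alt) pairs that fall in the APOBEC context."""
    if not mutations:
        return 0.0
    hits = sum(is_apobec_context(ctx, alt) for ctx, alt in mutations)
    return hits / len(mutations)


if __name__ == "__main__":
    calls = [("TCA", "T"), ("TCT", "G"), ("ACG", "T"), ("TGA", "A")]
    print(apobec_fraction(calls))  # 0.75
```

Here ("TGA", "A") counts as APOBEC because, on the opposite strand, it is the TCA>TTA change.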

2
Comparative genomic analysis reveals shared and distinct mechanisms of nasal polyps and chronic rhinosinusitis

Yuan, S.; McVey, J. C.; Hartmann, K.; Abramowitz, S.; Woerner, J.; Shakt, G.; Judy, R.; Douglas, J. E.; Voight, B. F.; Kohanski, M. A.; Cohen, N. A.; Levin, M.; Damrauer, S. M.

2026-04-08 otolaryngology 10.64898/2026.04.07.26350325 medRxiv
Top 0.1%
6.7%

Background: Chronic rhinosinusitis (CRS) and nasal polyps (NP) are closely related inflammatory airway diseases, and their co-occurrence is often associated with more persistent symptoms, frequent recurrence, and substantial respiratory morbidity. However, the extent to which CRS without and with NP (CRSsNP and CRSwNP) share genetic susceptibility, and which genetic mechanisms are disease-specific, remains poorly characterized. Methods: We conducted cross-population genome-wide association meta-analyses of overall CRS (including both CRSwNP and CRSsNP) and NP (a proxy for CRSwNP) using data from six biobanks. We estimated genome-wide genetic correlations between overall CRS, CRSwNP, and a spectrum of respiratory diseases. We applied five complementary gene-prioritization strategies to nominate CRS- and CRSwNP-associated genes and performed pathway enrichment analyses to infer implicated biological processes. For CRSwNP, we integrated single-cell transcriptomic data to characterize cell-type-specific expression of prioritized genes and used stratified LD score regression to quantify heritability enrichment across immune and epithelial annotations. To delineate shared versus disease-specific genetic signals, we performed three comparative analyses: local genetic correlation, CRSwNP-CRS colocalization, and genomic structural equation modeling. Finally, we performed proteome-wide Mendelian randomization to identify circulating proteins with putative causal effects on CRS and CRSwNP. Results: This GWAS meta-analysis identified 96 genome-wide significant loci for CRSwNP and 41 for overall CRS, prioritizing 92 and 39 candidate genes, respectively. CRSwNP and overall CRS showed shared genetic susceptibility (rg = 0.59; P = 6.8e-16), while CRS exhibited broader genetic correlations across multiple respiratory disorders. Pathway analyses consistently implicated immune signaling (albeit with disease-specific emphases) and lipid-metabolism networks.
Single-cell analyses demonstrated distinct expression of CRSwNP-prioritized genes across nasal epithelial and immune cell clusters, and immune annotations explained more CRSwNP heritability (enrichment score = 4.1; P = 0.010) than epithelial annotations (2.5; P = 0.072). Comparative genetic analyses highlighted multiple shared loci, including BACH2, CD247, FADS2, FOXP1, FUT2, GPX4, IL7R, NDFIP1, RAB5B, RORA, SMAD3, and TSLP, as well as 3 CRSwNP-specific and 6 CRS-specific loci. Proteome-wide MR identified 10 and 8 putatively causal circulating proteins for CRSwNP and overall CRS, respectively, with TNFSF11, IL2RB, and STX4 associated with both conditions. Conclusions: This multi-population GWAS meta-analysis expanded genetic discovery for CRS and CRSwNP and showed substantial shared liability with distinct disease-specific components. Immune annotations explained a larger proportion of CRSwNP heritability than epithelial annotations, reinforcing the primacy of immune-driven mechanisms in polyp disease.
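Proteome-wide MR of the kind described above combines per-variant Wald ratios into one causal estimate per protein. A minimal sketch of the standard fixed-effect inverse-variance weighted (IVW) estimator, assuming harmonised summary statistics; the abstract does not state which MR estimator(s) the authors used, so this is illustrative only, and the numbers are invented.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization.

    Each instrument contributes a Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2 (first-order weights).
    Returns (causal estimate, standard error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("summary statistic vectors must have equal length")
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical pQTL instruments for one circulating protein (illustrative numbers).
bx = [0.10, 0.20, 0.30]   # SNP -> protein effects
by = [0.05, 0.10, 0.15]   # SNP -> disease log-odds
se = [0.01, 0.01, 0.01]   # standard errors of SNP -> disease effects
estimate, stderr = ivw_mr(bx, by, se)
print(round(estimate, 3), round(stderr, 4))
```

Because every instrument here has the same Wald ratio (0.5), the IVW estimate is 0.5; real analyses would add heterogeneity and pleiotropy-robust sensitivity checks.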

3
Evaluating the AI Potential as a Safety Net for Diagnosis: A Novel Benchmark of Large Language Models in Correcting Diagnostic Errors

Hassoon, A.; Peng, X.; Irimia, R.; Lianjie, A.; Leo, H.; Bandeira, A.; Woo, H. Y.; Dredze, M.; Abdulnour, R.-E.; McDonald, K. M.; Peterson, S.; Newman-Toker, D.

2026-02-24 health systems and quality improvement 10.64898/2026.02.22.26346832 medRxiv
Top 0.1%
6.2%

Background: Diagnostic errors are a leading cause of preventable patient harm, often occurring during early clinical encounters where diagnostic uncertainty is maximal. Large language models (LLMs) have shown potential in medical reasoning, yet their ability to function as a diagnostic safety net, specifically by identifying and correcting human diagnostic errors, remains systematically unquantified. We evaluated whether state-of-the-art LLMs can effectively challenge, rather than merely confirm, an erroneous physician diagnosis. Methods: We evaluated 16 leading LLMs (including GPT-o1, Gemini 2.5 Pro, and Claude 3.7 Sonnet) using 200 standardized clinical vignettes representing 20 high-stakes, frequently misdiagnosed conditions. Models were presented with the full clinical record and an incorrect physician diagnosis. Primary outcomes included the diagnostic correction rate (disagreeing with the error and providing the correct diagnosis) and the ratio of correction to error detection. We further tested model robustness by generating 2,200 variants to assess the influence of demographic (race/ethnicity) and contextual (institutional reputation, training level, insurance) tokens. Results: Diagnostic correction rates varied significantly across models. Gemini 2.5 Pro demonstrated the highest performance, correcting the physician's error in 55.0% of cases (n=110/200), followed by Claude Sonnet 3.5 (48.5%) and Sonnet 4 (47.0%). In contrast, DeepSeek V3 corrected only 20.0% of cases. Performance was strikingly consistent at the disease level: most models failed to correct errors in syphilis, spinal epidural abscess, and myocardial infarction. Furthermore, several models exhibited confirmation bias (agreeing with the incorrect diagnosis) in 11.0% to 50.0% of cases. Stability across demographic and contextual variants was inconsistent, with some models showing spurious performance shifts based on non-clinical tokens.
Conclusion: While top-performing LLMs can intercept approximately half of the human diagnostic errors in high-stakes scenarios, performance is heterogeneous and highly sensitive to non-clinical context. Current models exhibit significant disease-specific gaps and a tendency toward confirmation bias, suggesting that their safe clinical integration requires adversarial, multi-agent workflows designed to prioritize skepticism over baseline agreement.
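The primary outcomes above can be made concrete with a small tally. A minimal sketch, assuming each model response has been adjudicated into a stance toward the physician's diagnosis plus a flag for whether the correct diagnosis was named; the `CaseResult` structure and the three-way stance are hypothetical, and treating "detected" as "disagreed" is our reading of the abstract, not the authors' published definition. The 30/60 split of non-corrected cases is invented.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    stance: str             # "agree", "disagree", or "uncertain" toward the (incorrect) physician diagnosis
    named_correct_dx: bool  # True if the model offered the reference diagnosis


def summarize(results):
    """Per-model rates; the operational definitions here are assumptions."""
    n = len(results)
    detected = sum(r.stance == "disagree" for r in results)
    corrected = sum(r.stance == "disagree" and r.named_correct_dx for r in results)
    confirmed = sum(r.stance == "agree" for r in results)
    return {
        "correction_rate": corrected / n,
        "correction_to_detection": corrected / detected if detected else float("nan"),
        "confirmation_rate": confirmed / n,
    }


# Illustrative counts chosen so the correction rate matches 110/200 = 55.0%.
cases = ([CaseResult("disagree", True)] * 110
         + [CaseResult("disagree", False)] * 30
         + [CaseResult("agree", False)] * 60)
print(summarize(cases))
```

Separating detection from correction lets a model that flags the error but names the wrong alternative be distinguished from one that simply agrees.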

4
Iterative Extracellular Vesicle Protein Co-Expression Biomarker Refinement for Preoperative Classification of Histopathological Growth Patterns in Colorectal Liver Metastasis Patients

Martel, R.; Shen, M. L.; Tsamchoe, M. L.; Petrillo, S. K.; Lazaris, A.; Metrakos, P.; Juncker, D.

2026-02-04 bioengineering 10.64898/2026.02.02.702621 medRxiv
Top 0.1%
4.9%

Preoperative triage of colorectal liver metastases (CRCLM) by histopathological growth pattern (HGP), either angiogenesis-dependent desmoplastic (dHGP) or vessel co-opting replacement (rHGP), could guide anti-angiogenic therapy, yet HGP scoring requires resected tissue. We present an extracellular vesicle (EV) inner and outer protein (EVPio) co-expression assay and iterative biomarker refinement for plasma-based HGP classification. We established a minimal, high-throughput plasma pre-processing workflow (low-speed centrifugation and 0.45 µm filtration) with EVPio assay performance comparable to size-exclusion chromatography. We also developed an EV biomarker selection template applied to growing cohorts (feasibility, n = 3; pilot, n = 9; discovery, n = 67) that ranks candidate protein pairs by signal quality (SNR, CV), redundancy (inter-correlation/orthogonality), and HGP separation (effect size, significance, ROC). This process reduced an initial 19x18 capture/detection set to a focused 9x9 panel (81 co-expression pairs). In a 58-patient CRCLM subset (22 dHGP, 14 rHGP, 22 mixed), three pairs achieved high signal quality with significant differential expression across HGPs. A three-feature linear discriminant model yielded 75.9% cross-validated accuracy (AUC 0.77) for classifying pure dHGP vs. non-dHGP. The results show that co-expression signatures capture defining features of HGP biology while revealing heterogeneity. The proposed EV biomarker refinement template is generalizable and supports efforts towards clinically actionable, HGP-driven therapeutic guidance.
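The HGP-separation step of the selection template can be sketched as a ranking over candidate co-expression pairs. This minimal version assumes one normalized signal per patient per pair and scores separation with a rank-based (Mann-Whitney) AUC and Cohen's d only; the SNR/CV quality filters and redundancy pruning described above are omitted, and the pair names and values are placeholders.

```python
import math
import statistics


def rank_auc(pos, neg):
    """Mann-Whitney AUC: P(random positive > random negative), ties count 0.5."""
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def rank_pairs(pair_signals, is_dhgp):
    """Rank capture/detection pairs by how well they separate dHGP from non-dHGP.

    pair_signals: {pair_name: [signal per patient]}; is_dhgp: [bool per patient].
    Distance of the AUC from 0.5 is used so down-regulated pairs also rank high.
    """
    ranked = []
    for name, values in pair_signals.items():
        pos = [v for v, d in zip(values, is_dhgp) if d]
        neg = [v for v, d in zip(values, is_dhgp) if not d]
        ranked.append((name, rank_auc(pos, neg), cohens_d(pos, neg)))
    return sorted(ranked, key=lambda r: abs(r[1] - 0.5), reverse=True)


labels = [True, True, True, False, False, False]
signals = {"A/B": [5, 6, 7, 1, 2, 3], "C/D": [1, 5, 3, 2, 6, 3]}  # hypothetical pair names
for name, auc, d in rank_pairs(signals, labels):
    print(f"{name}: AUC={auc:.2f}, d={d:.2f}")
```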

5
Enhancing Prediabetes Diagnosis from Continuous Glucose Monitoring Data via Iterative Label Cleaning and Deep Learning

Arethiya, N. J.; Krammer, L.; David, J.; Bakshi, V.; BasuChoudhary, A.; Bhuiyan, U.; Sen, S.; Mazumder, R.; McNeely, P.

2026-03-05 health informatics 10.64898/2026.03.04.26347604 medRxiv
Top 0.1%
4.9%

As of early 2026, over 115 million US adults (more than 1 in 3) have prediabetes, a condition with an annual conversion rate of 5%-10% to type 2 diabetes. Total diabetes (diagnosed and undiagnosed) affects approximately 40.1 million Americans, or 12% of the population, with roughly 1.5 million new cases diagnosed annually. Continuous Glucose Monitoring (CGM) provides real-time, 24/7 insights into glycemic variability, detecting dangerous highs, lows, and trends that HbA1c (a 3-month average) misses. It enables, for instance, identification of nocturnal hypoglycemia or postprandial spikes, enhancing personalized, actionable treatment decisions and improving safety. The Artificial Intelligence Ready and Exploratory Atlas for Diabetes Insights (AI-READI) dataset was produced by the National Institutes of Health (NIH) Common Fund Data Ecosystem (CFDE) Bridge2AI program. This dataset offers a rich resource for diabetes research, providing comprehensive biosensor data from over 1,067 participants. However, like many medical datasets, AI-READI contains label inaccuracies due to self-reported health surveys and static HbA1c indicators, which can undermine model effectiveness. We developed a robust classification framework using Convolutional-Bidirectional Long Short-Term Memory (Conv+BiLSTM) to analyze and accurately classify glycemic health states from continuous glucose monitoring time-series data. Our aim was to identify and correct misclassified labels through hybrid unsupervised-supervised learning methods, and we validated our results with expert-in-the-loop clinical review. We analyzed 784 participants from the AI-READI dataset, representing four health states: healthy, prediabetes lifestyle controlled, oral medication, and insulin-dependent.
Based on recommendations from the literature and our own expertise, we compared the self-reported "healthy" group labels with a cluster-agnostic, CGM-defined healthy (CGM-H) reference: we applied K-means clustering (K=6) to standardized CGM summary features to identify CGM-H participants and then applied XGBoost-based iterative label refinement. We identified a misclassification rate of 56.9% (161/283) in the initially labeled "healthy" group. After eight iterations of XGBoost refinement with dual-criterion relabeling (≥80% probability and unanimous out-of-fold voting), the number of CGM-H participants available for binary classification increased from 122 to 195. Next, we developed a Conv+BiLSTM model combining convolutional layers (32 and 64 filters) for local temporal feature extraction with bidirectional LSTM layers (64 and 32 units) for sequence modeling, using engineered time-series features including rolling statistics, glucose derivatives, and circadian rhythm encoding. Class imbalance was addressed with per-class weighting, and 5-fold stratified cross-validation estimated generalization performance; a global decision threshold (0.374) was computed by maximizing Youden's J statistic on concatenated out-of-fold predictions. Additionally, we analyzed heart rate, activity level, stress, and sleep data and validated them against CGM data. The Conv+BiLSTM model achieved ROC-AUC ≈ 0.932 on the held-out test set and 0.907 ± 0.026 in cross-validation, with well-calibrated predictions (Expected Calibration Error = 0.075, temperature scaling T = 1.00). A 3-tier confidence-based decision system achieved an 82% detection rate with only 6% OGTT burden, enabling actionable clinical recommendations. This hybrid approach addressed label noise while achieving high discrimination. This framework demonstrates potential for real-time glycemic state monitoring and early intervention in diabetes progression.
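The global decision threshold described above (chosen by maximizing Youden's J on concatenated out-of-fold predictions) can be sketched in a few lines. This assumes binary labels (1 = prediabetes) and probability-like scores; it is a generic implementation rather than the authors' code, and the fold values are invented.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is called positive when score >= threshold. Candidate thresholds are
    the observed scores; on ties the lowest threshold is kept.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Out-of-fold predictions from each CV fold are concatenated before the search,
# so a single global threshold is chosen (toy values, not study data).
fold_labels = [[0, 1, 1], [0, 0, 1]]
fold_scores = [[0.10, 0.35, 0.80], [0.20, 0.40, 0.90]]
y = [v for fold in fold_labels for v in fold]
s = [v for fold in fold_scores for v in fold]
print(youden_threshold(y, s))
```

Pooling out-of-fold scores avoids picking a different cut-off per fold, which would make the cross-validated estimate optimistic.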

6
Regression vs. Medical LLMs: A Comprehensive Study for CVD and Mortality Risk Prediction

KOM SANDE, S. D.; Skorski, M.; Theobald, M.; Schneider, J.; Marz, W.

2026-03-11 health informatics 10.64898/2026.03.11.26347789 medRxiv
Top 0.1%
4.8%

Cardiovascular diseases (CVDs) remain the foremost cause of global morbidity and mortality, driving an urgent need for robust predictive tools that enable early detection and preventive intervention. Traditional regression-based models, such as linear and logistic regression, regression trees and forests, and Support Vector Machines (SVMs), have long underpinned CVD risk estimation but often assume linear relationships, homogeneous effects across populations, and a limited number of predictors. Recent advances in regression, such as bagging and boosting, as well as Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs), are increasingly shifting this paradigm. In this paper, we review key developments in both classic regression techniques and recent GenAI approaches, with a particular focus on openly available Medical LLMs (MedLLMs) combined with few-shot prompting and classification finetuning. Using the LURIC cardiovascular health study, we investigate a broad variety of biomarkers and risk factors in two different cohorts of 3,316 CVD risk patients who underwent coronary angiography in Germany between 1997 and 2000. Our results demonstrate that large, pretrained MedLLMs (70B) achieve up to 82% AUROC for 1-year all-cause mortality (1YM) prediction with optimized few-shot prompting, thus performing competitively with recent regression techniques and state-of-the-art methods from the medical literature such as CoroPredict, SMART, and SCORE2. Smaller models (8B) can be finetuned to match or even surpass their larger counterparts as well as commercial models like Claude Sonnet 4.5 and ChatGPT-5.2. Among all evaluated approaches, the best-performing boosting-based regression technique (CatBoost) and commercial LLM (Gemini-3-Flash) both achieve an AUROC of up to 85%.
Further model-calibration and -stratification analyses reveal a systematic mortality over-prediction (ECE: 0.05-0.10) of MedLLMs, while Platt scaling effectively reduces such miscalibrations by 60-90%.
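The ECE values quoted above summarize the gap between predicted and observed mortality. A minimal sketch of the common equal-width-bin definition follows; the number of bins (10) is an assumption, since the abstract does not state the binning used, and Platt scaling itself (fitting a logistic regression on held-out model scores) is not shown.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width-bin ECE for a binary outcome.

    Within each probability bin, compare the observed event rate with the mean
    predicted probability, then weight the absolute gap by the bin's share of samples.
    """
    n = len(y_true)
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(observed - predicted)
    return ece


# Toy example of systematic over-prediction: events occur less often than predicted.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.30, 0.30, 0.30, 0.30, 0.90, 0.90, 0.90, 0.90]
print(round(expected_calibration_error(y_true, y_prob), 3))
```

In this toy case both bins over-predict (0.30 vs 0.25 and 0.90 vs 0.75), giving an ECE of 0.1.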

7
Comparison of the prevalence of all diagnosed diseases among Estonian Biobank participants against the general population

Pajusalu, M.; Oja, M.; Mooses, K.; Heinsar, S.; Aasmets, O.; Laisk, T.; Palta, P.; Org, E.; Magi, R.; Vosa, U.; Fischer, K.; Estonian Biobank Research Team; Tillmann, T.; Laur, S.; Reisberg, S.; Vilo, J.; Kolde, R.

2026-02-06 health informatics 10.64898/2026.02.05.26345634 medRxiv
Top 0.1%
4.8%

Characterizing study sample representativeness is critical for the validity of biobank-derived findings, yet selection biases are rarely quantified across the full clinical spectrum. Here we systematically evaluate the Estonian Biobank (EstBB), comprising ~20% of the adult population, by comparing two recruitment waves against a 30% national reference dataset (Est-Health-30). Analyzing prevalence ratios (PR) across 1,028 ICD-10 categories, we reveal a bifurcated landscape of representativeness. While EstBB achieves population parity (PR ≈ 1.0) for widespread chronic conditions like type 2 diabetes, 47% of diagnostic categories exhibit significant deviations (>1.3-fold). We identify a distinct "managed symptomatic" phenotype: a systematic enrichment of outpatient diagnoses, such as melanocytic nevi (PR = 2.07, 95% CI 1.93-2.21) and major depression (PR = 1.53, 95% CI 1.4-1.66), coupled with a depletion of high-mortality conditions like lung cancer (PR = 0.69, 95% CI 0.64-0.75) and vascular dementia (PR = 0.45, 95% CI 0.38-0.54). These biases evolved across recruitment phases, with the later EstBB2 cohort exhibiting a healthier, prevention-oriented profile. To support research integrity, we provide an interactive open-access dashboard for phenotype refinement. Accounting for such selection-driven "clinical visibility" is essential to avoid collider bias in risk prediction and causal inference models.
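Prevalence ratios and confidence intervals like those above can be computed in outline from case counts. A minimal sketch, assuming simple per-category case counts in the biobank and the reference dataset; the log-scale (Katz) interval is one standard choice and may differ from the authors' method, and reading the ">1.3-fold" criterion as a symmetric band (PR > 1.3 or PR < 1/1.3) is our assumption. The counts are hypothetical.

```python
import math


def prevalence_ratio(cases_a, n_a, cases_b, n_b, z=1.96):
    """Prevalence ratio of group A vs group B with a log-scale (Katz) Wald CI.

    Assumes the two groups are independent samples; a biobank drawn from the
    reference population violates this, so treat the interval as approximate.
    """
    pr = (cases_a / n_a) / (cases_b / n_b)
    se_log = math.sqrt(1 / cases_a - 1 / n_a + 1 / cases_b - 1 / n_b)
    return pr, math.exp(math.log(pr) - z * se_log), math.exp(math.log(pr) + z * se_log)


def deviates(pr, fold=1.3):
    """True when enrichment or depletion exceeds the given fold change."""
    return pr > fold or pr < 1 / fold


# Hypothetical counts, not study data.
pr, lo, hi = prevalence_ratio(cases_a=400, n_a=20_000, cases_b=1_500, n_b=150_000)
print(f"PR={pr:.2f} (95% CI {lo:.2f}-{hi:.2f}), deviant={deviates(pr)}")
```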

8
Self-Reported Side Effects of Semaglutide and Tirzepatide in Online Communities

Sehgal, N. K. R.; Tronieri, J. S.; Ungar, L.; Guntuku, S. C.

2026-03-13 health informatics 10.64898/2026.03.12.26348253 medRxiv
Top 0.1%
4.8%

Social media can reveal patient experiences with glucagon-like peptide-1 receptor agonists (GLP-1 RAs) that extend beyond clinical trial data. We analyzed 410,198 Reddit posts (May 2019-June 2025) mentioning semaglutide or tirzepatide. A total of 67,008 users self-reported using these medications, and 43.5% described at least one side effect. Gastrointestinal symptoms predominated; the most frequently reported side effects were nausea (36.9%), fatigue (16.7%), vomiting (16.3%), constipation (15.3%), and diarrhea (12.6%). Notably, reproductive symptoms (e.g., menstrual irregularities) and temperature-related complaints (e.g., chills, hot flashes) emerged as previously unrecognized potential effects. These findings highlight patient concerns not well captured in current labeling or trials. Large-scale social media analysis can complement traditional pharmacovigilance by detecting emerging safety signals and expanding understanding of the real-world safety profile of GLP-1 RAs.

9
Predictive Modeling of COVID-19 Variant Peak Prevalence and Duration Using GISAID Data Across 15 Countries

Zhang, Y.; Rob, P.; Chen, K.; Overton, C. E.; Jung, J.; Jo, Y.

2026-02-05 infectious diseases 10.64898/2026.02.04.26345559 medRxiv
Top 0.1%
4.8%

Background: Rapid emergence and replacement of SARS-CoV-2 variants underscore the need for early and reliable indicators of variant dominance to guide timely public health response. However, early genomic trajectories are typically short, sparse, and noisy, with strong fluctuations and substantial cross-country heterogeneity in sequencing intensity and reporting. Methods: We develop a scalable forecasting framework that predicts whether new variants will reach high prevalence and how long they will persist based on their initial genomic growth patterns. Using more than nine million sequences from 15 countries (GISAID, 2020-2024), we characterize dominance through peak prevalence and duration above 10% and extract early growth descriptors from the first 2-4 weeks after a lineage surpasses 1% frequency. Outcomes were classified using multiple models (GLM, GAM, SVM, CART, Elastic Net, and SuperLearner). We evaluated performance based on accuracy and used SHAP analysis to interpret feature importance. Results: The SuperLearner ensemble achieved the best performance, with up to 0.76 accuracy for peak-share prediction and up to 0.70 accuracy for duration classification, substantially outperforming all individual models. SHAP analysis showed that variants achieving high peaks exhibit strong but structurally coherent early growth, whereas prolonged dominance is associated not with early surges but with sustained, moderate short-term fluctuations embedded within a stable trajectory. Conclusion: This framework defines minimum surveillance thresholds (≥100 sequences in 30 days, ≥1% detection share), variant grouping rules, and noise-filtering protocols, enabling cross-country comparison and country-specific forecasting. It provides a lightweight, reproducible early-warning tool for genomic surveillance and real-time epidemic intelligence.
Significance: Identifying emerging SARS-CoV-2 variants capable of driving new surges is critical for global preparedness but remains challenging due to sparse early data. We present a machine learning framework that forecasts variant dominance using only the first 2-4 weeks of genomic growth. Analyzing nine million sequences across 15 countries, we reveal two distinct epidemiological signatures: high peak prevalence is driven by explosive, coherent early expansion, while long-term persistence is predicted by sustained, moderate fluctuations rather than initial speed. By establishing minimum surveillance thresholds, this work delivers a scalable, data-efficient early-warning tool that links early genomic signatures of viral fitness to downstream population-level dominance, achieving high predictive accuracy with a minimal number of sequences.
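The dominance outcomes and early growth descriptors above can be illustrated on one lineage's weekly frequency series. A minimal sketch, assuming weekly aggregation of sequences into lineage shares; the logit-scale growth slope is a hypothetical example of an "early growth descriptor" (the abstract does not list the exact features), and "duration" is counted here as total weeks above 10%, not necessarily contiguous.

```python
import math


def dominance_outcomes(weekly_share, dominance_level=0.10):
    """Peak prevalence and number of weeks the lineage share exceeds dominance_level."""
    return max(weekly_share), sum(1 for s in weekly_share if s > dominance_level)


def _logit(s, eps=1e-6):
    """Logit with clipping so shares of exactly 0 or 1 stay finite."""
    s = min(max(s, eps), 1 - eps)
    return math.log(s / (1 - s))


def early_logit_slope(weekly_share, emergence_level=0.01, window_weeks=4):
    """Least-squares slope of logit(share) over the first weeks after emergence.

    Emergence is the first week the share exceeds emergence_level.
    """
    start = next(i for i, s in enumerate(weekly_share) if s > emergence_level)
    ys = [_logit(s) for s in weekly_share[start:start + window_weeks]]
    if len(ys) < 2:
        raise ValueError("need at least two weeks after emergence")
    xs = list(range(len(ys)))
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Synthetic lineage growing logistically at 0.5 log-odds per week (illustrative).
shares = [0.001, 0.005] + [1 / (1 + math.exp(-(-4 + 0.5 * t))) for t in range(10)]
print(dominance_outcomes(shares), round(early_logit_slope(shares), 3))
```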

10
Where risk becomes visible: a layered fixed-policy framework for diabetic kidney disease screening in type 2 diabetes

Khattab, A.; Wang, Z.; Srinivasasainagendra, V.; Tiwari, H. K.; Loos, R.; Limdi, N.; Irvin, M. R.

2026-04-22 nephrology 10.64898/2026.04.21.26351384 medRxiv
Top 0.1%
4.4%

Background: Diabetic kidney disease (DKD) is a leading cause of kidney failure in individuals with type 2 diabetes (T2D), yet risk identification in routine clinical practice remains incomplete. A critical and often overlooked barrier is risk observability: how much of a patient's underlying risk is actually captured in their clinical record at the time of screening. Existing prediction models evaluate performance using model-specific thresholds, making it difficult to understand how additional data sources alter real-world screening behavior or which individuals benefit when models are expanded. Methods: We developed a series of five nested machine learning models, evaluated at a one-year landmark following T2D diagnosis, using data from the All of Us Research Program (N = 39,431; cases = 16,193). The five successive information layers were intrinsic risk, laboratory snapshots, medication exposure, longitudinal care trajectories, and social determinants of health (SDOH); each model added one layer while retaining all prior features. All models were evaluated under a fixed screening policy targeting 90% specificity, so that the false positive rate remained constant as the information available to the model grew. External validation was conducted in the BioMe Biobank (N = 9,818) without retraining. Results: Discrimination improved consistently across layers, from AUROC 0.673 (M1) to 0.797 (M5). Under the fixed screening policy, sensitivity nearly doubled from 0.27 to 0.49, with a cumulative recovery of 30.4% of cases missed by the base model.
Gains were driven by distinct subgroups at each transition: laboratory features identified biologically high-risk individuals; medication features captured those with high treatment intensity reflecting advanced cardiometabolic burden; longitudinal care trajectory features rescued cases with biological instability observable only through repeated measurements; and SDOH features recovered individuals with limited clinical observability, with rescue probability highest among those with the fewest recorded monitoring domains. Sparse data in the clinical record indicated low observability, not low risk. Social and genetic features each contributed most when downstream physiologic signal was limited, supporting a contextual rather than universal role for each. In BioMe, discrimination was attenuated (M4 AUROC 0.659), but the relative ordering of information layers was fully preserved, and a systematic upward shift in predicted probability distributions underscored the need for recalibration before deployment in a new setting. Conclusions: DKD risk detection in T2D is substantially improved by integrating complementary information layers under a fixed clinical screening policy, with gains arising from distinct domains that identify at-risk individuals in different clinical contexts. The layered landmark framework introduced here reveals how risk observability, shaped by monitoring intensity, healthcare engagement, and access, determines what a screening model can detect, and provides a foundation for context-aware EHR-based screening that accounts for data availability at the time of risk assessment.
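The fixed screening policy described above can be sketched as choosing, for each model, the score cut-off that holds specificity at 90% among non-cases and then reading off sensitivity. A minimal sketch, assuming higher scores mean higher DKD risk and that the cut-off is set on development-cohort controls; the function names are hypothetical. Because the study reports reusing the same thresholds in BioMe without retraining, specificity in the external cohort need not stay at exactly 90%.

```python
import math


def threshold_at_specificity(control_scores, target_specificity=0.90):
    """Score cut-off giving at least the target specificity among non-cases.

    Individuals are screened positive when score > threshold, so at least
    ceil(target * n) controls fall at or below the cut-off.
    """
    ordered = sorted(control_scores)
    # round() guards against float artefacts such as 0.9 * n = 90.00000000000001
    k = math.ceil(round(target_specificity * len(ordered), 9))
    return ordered[max(k, 1) - 1]


def sensitivity(case_scores, threshold):
    """Fraction of cases flagged under the fixed policy."""
    return sum(s > threshold for s in case_scores) / len(case_scores)


# Toy scores; in a nested design each model layer gets its own cut-off at the same
# specificity, so the false positive rate stays fixed while sensitivity can change.
controls = [i / 100 for i in range(100)]
cases = [0.50, 0.95, 0.97, 0.20]
t = threshold_at_specificity(controls)
print(t, sensitivity(cases, t))
```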
Graphical abstract. Study design and layered DKD screening framework. The top row defines the cohort timeline, in which predictors are derived from clinical data collected between T2D diagnosis and the 1-year landmark, and incident DKD is ascertained after the landmark. The second row depicts the nested model architecture, in which five successive models sequentially incorporate intrinsic risk, laboratory snapshot features, medication exposure, longitudinal care trajectories, and social determinants of health, while retaining all features from prior layers. The third row summarizes model development in the All of Us Research Program (N = 39,431) and external validation in the BioMe Biobank (N = 9,818), where the same trained models and risk thresholds were applied without retraining. The bottom row highlights the three evaluation domains: predictive performance, fixed-policy screening, and missed-case recovery context. DKD, diabetic kidney disease; T2D, type 2 diabetes; PRS, polygenic risk scores; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; PPV, positive predictive value; SHAP, SHapley Additive exPlanations.

11
Insights from the second season of collaborative influenza forecasting in Italy with updated targets incorporating virological information

Fiandrino, S.; Bertola, T.; D'Andrea, V.; De Domenico, M.; Viola, E.; Zino, L.; Mazzoli, M.; Rizzo, A.; Li, Y.; Perra, N.; Sartore, M.; Masoumi, R.; Poletto, C.; Mateo Urdiales, A.; Bella, A.; Gioannini, C.; Milano, P.; Paolotti, D.; Quaggiotto, M.; Rossi, L.; Vismara, I.; Vespignani, A.; Gozzi, N.

2026-03-04 epidemiology 10.64898/2026.03.04.26347601 medRxiv
Top 0.1%
4.2%

We present results from the second season of Influcast, a multi-model collaborative forecasting hub focused on influenza in Italy. During the 2024/25 winter season, Influcast collected one- to four-week-ahead probabilistic forecasts of influenza-like illness (ILI) incidence alongside influenza A and B ILI+ incidence signals. New ILI+ targets were constructed integrating syndromic surveillance data with virological detections collected weekly by the Italian National Institute of Health. Forecasts were submitted by six independent models (including compartmental, metapopulation, and statistical approaches) and combined into an ensemble. Ensemble forecasts for ILI+ consistently outperformed both the baseline (a naive persistence model) and most individual models in terms of Weighted Interval Score (WIS), Absolute Error (AE), and prediction coverage. Importantly, ensemble ILI+ forecasts achieved significantly lower WIS and AE ratios (i.e., ratio between the ensemble and the baseline models) and improved calibration compared to ILI forecasts. Our findings support the integration of virological surveillance data in forecasting target definition to improve the reliability of epidemic forecasts and strengthen their utility for situational awareness, communication, and targeted intervention.
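The Weighted Interval Score used above to compare the ensemble with the baseline has a standard form: a weighted sum of the absolute error of the median and the interval scores of central prediction intervals. A minimal sketch, assuming forecasts are submitted as a median plus central intervals; the interval levels and numbers are illustrative, and the final ratio mirrors the ensemble-to-baseline WIS ratio described in the abstract.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    score = upper - lower
    if y < lower:
        score += 2 / alpha * (lower - y)
    elif y > upper:
        score += 2 / alpha * (y - upper)
    return score


def weighted_interval_score(median, intervals, y):
    """WIS with median weight 1/2 and interval weights alpha/2.

    intervals maps alpha -> (lower, upper), e.g. {0.5: (l50, u50), 0.2: (l80, u80)}.
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += alpha / 2 * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


# Toy ILI+ forecast vs. a naive persistence baseline for one target week.
observed = 14.0
ensemble = weighted_interval_score(13.0, {0.5: (11.0, 15.0), 0.2: (9.0, 17.0)}, observed)
baseline = weighted_interval_score(10.0, {0.5: (8.0, 12.0), 0.2: (6.0, 14.0)}, observed)
print(round(ensemble, 3), round(baseline, 3), round(ensemble / baseline, 3))
```

A ratio below 1 means the ensemble scored better than the baseline for that target.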

12
Heterogeneous but Segmentable: A Data-Driven Approach to Modelling Long-Term Care Trajectories in Multiple Sclerosis

Vesinurm, M.; Makitie, L.; Lillrank, P.; Saarinen, L.; Torkki, P.; Laakso, S. M.; Koskinen, M.

2026-01-30 health systems and quality improvement 10.64898/2026.01.28.26345045 medRxiv
Top 0.1%
4.2%

Managing chronic diseases with unpredictable care demand creates significant operational challenges for healthcare systems. Mapping long-term care trajectories is crucial for improving resource allocation, anticipating service needs, and designing efficient care pathways. We used a data-driven approach to map six-year care trajectories for 962 newly diagnosed multiple sclerosis patients, identify utilization clusters, and determine predictors of high utilization. We analyzed event logs of remote, outpatient, emergency, and inpatient contacts from one year before to five years after diagnosis, using K-means clustering to identify utilization clusters, logistic regression to identify predictors of high utilization, and process mining to model variation between care trajectories. We identified two distinct utilization clusters: a high-utilization cluster (14.1 % of patients) with persistently elevated annual encounter volumes across all care settings and a low-utilization cluster (85.9 % of patients) with lower and declining use. Median service costs were €18,736 vs. €6,052 in the high- and low-utilization clusters, respectively. Two or more early relapses were the strongest predictor of high utilization (OR = 6.33, 95 % CI 3.49-11.50, p < 0.001), and the number of planned early remote and outpatient care contacts was also associated with future service utilization (OR = 1.07, 95 % CI 1.04-1.10, p < 0.001). High-utilization trajectories were approximately three times longer (82.4 vs 25.9 events) and more variable (3.1 vs 2.4 unique events per patient). These utilization clusters and their distinct trajectories provide a pragmatic segmentation of multiple sclerosis patients to support early identification of high-utilization subgroups and more robust capacity planning in specialist care.
Highlights:
- We tracked the care trajectories of 962 people with relapsing-remitting multiple sclerosis using a Finnish population-based specialist-care data lake covering both inpatient and outpatient neurology services.
- Patients fell into two distinct utilization clusters: a high-utilization cluster with frequent contacts across all care settings and a low-utilization cluster with lower and declining use.
- Two or more early relapses, and the number of early outpatient and remote contacts, were strong predictors of a patient's long-term membership in the high-utilization cluster.
- Segmented care trajectories showed that high-utilization patients followed longer, more varied, and acute-oriented care patterns and had much higher service encounter costs.
- These findings can help clinicians and managers identify potential high-utilization patients early, target resources more effectively, and plan for future healthcare demand.
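The clustering step described above can be sketched with plain Lloyd's K-means on per-patient encounter volumes. A minimal sketch, assuming each patient is summarized as a vector of annual encounter counts; the feature layout, yearly columns, and seed patients are hypothetical. In practice features would usually be standardized and K-means restarted from several initializations, and the abstract does not describe the preprocessing actually used.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep an empty cluster's centroid where it was.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else old)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical annual encounter counts per patient: [year -1, year 1, year 3, year 5].
patients = [[2, 6, 4, 3], [1, 5, 3, 2], [3, 7, 4, 2], [10, 30, 28, 25], [12, 26, 30, 27]]
labels, centers = kmeans(patients, initial_centroids=[patients[1], patients[4]])
print(labels)
```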

13
A graph-based deep learning framework for diabetic retinopathy classification with topology-aware feature augmentation

Belhadj, N. B.; Mezghich, M. A.; Fattahi, J.; Ghayoula, R.; Latrach, L.

2026-03-23 bioengineering 10.64898/2026.03.19.713075 medRxiv
Top 0.1%
3.9%

Diabetic retinopathy (DR) is the leading cause of preventable blindness in working-age adults, affecting an estimated 103 million people worldwide. Standard deep learning classifiers treat fundus images as independent samples, ignoring latent inter-patient relational structure that is most informative at clinically ambiguous intermediate severity levels. We propose a topology-aware, graph-based deep learning framework combining three complementary components: (i) an EfficientNet-B3 convolutional backbone for high-level visual feature extraction; (ii) persistent homology descriptors (H0 and H1) derived from morphologically skeletonised retinal vascular networks, characterising global vascular topology in a noise-robust manner; and (iii) a GraphSAGE graph neural network propagating disease-related information across a population-level similarity graph, refining representations through inductive neighbourhood aggregation. The similarity graph combines cosine similarity on visual features with 2-Wasserstein distance between persistence diagrams. Evaluated on three public benchmarks, the framework achieves 95.5% accuracy on Kaggle DR, 96.1% on Messidor-2, and 94.6% on APTOS 2019, consistently outperforming a strong CNN baseline by 1.5-2.3 percentage points across accuracy, Quadratic Weighted Kappa, and macro-F1. Ablation experiments confirm synergistic contributions of topological feature augmentation and relational graph learning. One-way ANOVA (F > 80, p < 0.001) confirms that DR progression is reflected in global vascular topology across all five severity stages, providing quantitative biological grounding for the framework design. Code and data are publicly available at https://github.com/Nader-BelHadj/plosene.
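The similarity graph combines cosine similarity on visual features with a 2-Wasserstein distance between persistence diagrams. A minimal sketch of those two ingredients follows, assuming NumPy and SciPy. The Euclidean ground metric, the fusion formula in `edge_weight`, and its parameters `alpha` and `sigma` are illustrative assumptions, not the authors' implementation (their code is in the linked repository).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_pd(d1, d2, p=2):
    """p-Wasserstein distance between two persistence diagrams.

    Each diagram is an array of (birth, death) pairs. A point may be matched
    to a point of the other diagram or to its nearest point on the diagonal.
    Uses a Euclidean ground metric (an assumption; some tools use L-infinity).
    """
    d1 = np.asarray(d1, dtype=float).reshape(-1, 2)
    d2 = np.asarray(d2, dtype=float).reshape(-1, 2)
    n, m = len(d1), len(d2)
    # Euclidean distance from (b, d) to the diagonal is (d - b) / sqrt(2).
    diag1 = (d1[:, 1] - d1[:, 0]) / np.sqrt(2)
    diag2 = (d2[:, 1] - d2[:, 0]) / np.sqrt(2)
    big = 1e18  # large finite cost standing in for forbidden assignments
    cost = np.zeros((n + m, m + n))
    if n and m:
        cost[:n, :m] = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2) ** p
    cost[:n, m:] = big
    cost[n:, :m] = big
    cost[np.arange(n), m + np.arange(n)] = diag1 ** p  # point i of d1 -> diagonal
    cost[n + np.arange(m), np.arange(m)] = diag2 ** p  # point j of d2 -> diagonal
    # Bottom-right block (diagonal matched to diagonal) stays at zero cost.
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum() ** (1.0 / p))

def edge_weight(f_i, f_j, pd_i, pd_j, alpha=0.5, sigma=1.0):
    """Hypothetical fusion of visual and topological similarity for one edge."""
    f_i, f_j = np.asarray(f_i, dtype=float), np.asarray(f_j, dtype=float)
    cos = float(np.dot(f_i, f_j) / (np.linalg.norm(f_i) * np.linalg.norm(f_j)))
    topo = float(np.exp(-wasserstein_pd(pd_i, pd_j) / sigma))
    return alpha * cos + (1 - alpha) * topo
```

Matching to the diagonal is what lets diagrams with different numbers of points be compared: for the single-point diagrams {(0, 2)} and {(0, 4)}, a direct match costs less than sending both points to the diagonal, so the distance is 2.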

14
CD276 in Meningioma Transcriptomic Classification: Internal Development, External Validation, and Stability-Informed Interpretation

Lee, H.; Kim, H.

2026-04-05 health informatics 10.64898/2026.04.03.26350116 medRxiv
Top 0.1%
3.7%

Background: CD276 has been proposed as a candidate gene associated with the biological characteristics of meningioma, but its predictive role and interpretive significance within a transcriptomic classifier have not yet been clearly established. Accordingly, this study aimed to evaluate CD276 stepwise across internal model development, external validation, calibration, decision-analytic assessment, feature stability, and robustness analyses using public transcriptomic cohorts. Methods: The analyses were organized into two interconnected notebooks. In Notebook A, we reconstructed the internal training cohort (GSE183653), evaluated the CD276 single-gene signal, and then developed a transcriptome-wide multigene classifier. We also performed permutation importance, bootstrap confidence interval estimation, label permutation testing, repeated cross-validation, CD276 ablation, and internal calibration analyses. In Notebook B, we reproduced the external validation cohort (GSE136661) in a fixed common-gene space, applied train-only recalibration and train-only threshold transfer, and extended the interpretation through decision curve analysis, stability analysis, enrichment analysis, and one-factor-at-a-time robustness analysis. Results: The internal training cohort consisted of 185 samples (25 of them WHO grade III cases) and 58,830 genes. CD276 expression showed a significant association with WHO grade, but the internal discrimination of the CD276-only baseline was limited (ROC-AUC 0.628, average precision 0.323, balanced accuracy 0.540). In contrast, the initial transcriptome-wide model achieved ROC-AUC 0.834 and PR-AUC 0.509, and under 5-fold cross-validation the canonical full-transcriptome model and the CD276-forced 5,001-feature branch achieved mean ROC-AUC/PR-AUC of 0.854/0.564 and 0.855/0.606, respectively, outperforming the CD276-only baseline (0.644/0.391).
CD276 was not included in the initial 5,000-feature filtered set and ranked 900th among 5,001 features even in the branch where it was forcibly included. In paired ablation analysis, the performance difference attributable to including CD276 was effectively zero (delta ROC-AUC 0.000062, delta PR-AUC 0.000056). Internal calibration analysis showed an overconfident probability pattern (Brier score 0.10501, intercept -1.421392, slope 0.413241). In external validation, the fixed multigene pipeline achieved ROC-AUC 0.928 and PR-AUC 0.335. Train-only recalibration improved calibration metrics while preserving discrimination, and decision curve analysis showed threshold-dependent but limited external utility. Stability analysis showed overlap between core-stable genes and high-impact genes, but CD276 was not supported as a dominant stable core feature and remained in the target-of-interest tier. In robustness analysis, some perturbations preserved the primary interpretation, whereas others revealed transform sensitivity or an alternative high-performing feature-space solution. Conclusions: CD276 is a gene of interest associated with meningioma grade, but it could not readily be interpreted as a strong standalone predictor or a dominant stable classifier feature. In this study, predictive performance rested not on CD276 alone but on a broader multigene transcriptomic structure, and probability outputs needed to be interpreted conservatively, with calibration taken into account. These findings position CD276 not as a direct single-gene classifier but as a biology-motivated target of interest that should be interpreted within a broader transcriptomic program.
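Two of the evaluation quantities named above, the Brier score and the net benefit used in decision curve analysis, have short closed forms. The sketch below assumes NumPy, binary labels (1 = WHO grade III), and predicted probabilities; the function names are hypothetical, it uses the standard net-benefit formula NB = TP/N - FP/N * pt / (1 - pt), and it does not reproduce the authors' notebooks or their calibration intercept/slope fitting.

```python
import numpy as np

def brier_score(y, p):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return float(np.mean((p - y) ** 2))

def net_benefit(y, p, threshold):
    """Decision-curve net benefit of treating every case with p >= threshold.

    NB = TP/N - FP/N * threshold / (1 - threshold)
    """
    y, p = np.asarray(y, dtype=int), np.asarray(p, dtype=float)
    treat = p >= threshold
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return float(tp / n - fp / n * threshold / (1 - threshold))

def net_benefit_treat_all(y, threshold):
    """Reference strategy: classify every sample as high grade."""
    return net_benefit(y, np.ones(len(y)), threshold)
```

A decision curve plots `net_benefit` over a range of thresholds against the treat-all curve and the treat-none line (zero); "limited external utility" corresponds to the model curve rising only modestly above both references.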

15
Lesion-Centric Latent Phenotypes from Segmentation Encoders for Breast Ultrasound Interpretability

Mittal, P.; Singh, D.; Chauhan, J.

2026-03-06 radiology and imaging 10.64898/2026.03.06.26347800 medRxiv
Top 0.1%
3.6%

We propose a lesion-centric phenotype learning pipeline for interpretable breast ultrasound (BUS) analysis. Predicted lesion masks are used for mask-weighted pooling of segmentation-encoder latents, producing compact embeddings that suppress background influence; a lightweight calibration step improves cross-dataset consistency. We cluster the embeddings to discover latent phenotypes and relate phenotype structure to morphology descriptors (compactness, boundary sharpness). In experiments on BUSI and BUS-UCLM, with external testing on BUS-BRA, lesion-centric pooling and calibration improve separability and enable strong malignancy probing (AUC 0.982), outperforming radiomics and a standard CNN baseline. A simple rule-gated generator further improves BI-RADS-style descriptor consistency on difficult cases.
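Mask-weighted pooling, as described above, replaces global average pooling with an average weighted by the predicted lesion mask. A minimal NumPy sketch follows, assuming a channels-first feature map and a mask already resized to the feature map's spatial resolution; the function name and the `eps` guard are hypothetical, and the calibration step is not included.

```python
import numpy as np

def mask_weighted_pool(features, mask, eps=1e-8):
    """Pool a C x H x W encoder feature map with a soft lesion mask (H x W).

    Pixels with mask ~ 0 (background) contribute little, so the resulting
    C-dimensional embedding mostly reflects the lesion region. The mask is
    assumed to be resized to the feature map's H x W beforehand.
    """
    features = np.asarray(features, dtype=float)
    mask = np.asarray(mask, dtype=float)
    weighted_sum = (features * mask[None, :, :]).sum(axis=(1, 2))
    return weighted_sum / (mask.sum() + eps)  # eps avoids division by zero for empty masks
```

With a binary mask this reduces to the mean feature vector over lesion pixels; a soft (probabilistic) mask down-weights uncertain boundary pixels instead of discarding them.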

16
Synergistic barriers to algorithmic recourse in healthcare and administrative systems

Demdiont, A. C.

2026-02-26 health systems and quality improvement 10.64898/2026.02.22.26346836 medRxiv
Top 0.1%
3.6%

Algorithmic decision systems mediate access to healthcare, credit, employment and housing, yet individuals who experience adverse decisions face multi-stage barriers when seeking recourse. We formalize these barriers as a series-structured system with 11 empirically parameterized stages across three layers (data integration, data accuracy and institutional access) and prove that single-barrier interventions are bounded by baseline system success. Under baseline parameterization derived from federal datasets and peer-reviewed algorithmic audit studies, end-to-end recourse probability is 0.0018%. Removing any single barrier yields negligible improvement (<0.02%). Factorial decomposition reveals that the three-way cross-layer interaction accounts for 87.6% of achievable improvement, confirmed by Shapley attribution, Sobol sensitivity analysis and bootstrap resampling (n = 1,000). These results provide a structural explanation for the limited impact of incremental reforms and support coordinated multi-layer intervention approaches for clinical AI governance and algorithmic fairness.
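In a series-structured system, end-to-end success is the product of the stage success probabilities, so removing one barrier (setting its probability to 1) can raise success only to the product of the remaining stages. The sketch below illustrates this and the standard 2^3 three-way interaction contrast over three layers. The stage and layer probabilities are placeholders, since the abstract does not list the 11 parameters, and the authors' factorial decomposition may be defined differently.

```python
from itertools import combinations
from math import prod

def series_success(stage_probs):
    """End-to-end success of a series system: every stage must succeed."""
    return prod(stage_probs)

def single_barrier_removal(stage_probs):
    """Success probability when stage i is made certain (p_i = 1), for each i."""
    return [
        prod(p for j, p in enumerate(stage_probs) if j != i)
        for i in range(len(stage_probs))
    ]

def three_way_interaction(layer_probs):
    """Three-way interaction contrast in a 2^3 factorial design over layers.

    y(S) is end-to-end success when the layers in S are fully fixed (set to 1).
    """
    def y(fixed):
        return prod(p for i, p in enumerate(layer_probs) if i not in fixed)

    total = 0.0
    for k in range(4):
        for subset in combinations(range(3), k):
            total += (-1) ** (3 - k) * y(set(subset))
    return total
```

For a pure product model the contrast simplifies to (1 - L1)(1 - L2)(1 - L3), so when every layer has a low success probability the interaction term dominates the achievable improvement 1 - L1*L2*L3. With three hypothetical layers at 0.1 each, the interaction is 0.729 out of a possible 0.999, which illustrates why single-barrier fixes barely move the end-to-end probability.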

17
SPLIT: Safety Prioritization for Long COVID Drug Repurposing via a Causal Integrated Targeting Framework

Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.

2026-04-16 health informatics 10.64898/2026.04.12.26350701 medRxiv
Top 0.1%
3.5%

Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines are poorly matched to the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis will generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable often differed between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework for identifying safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.

18
Proteomic Discovery of Urinary Myoglobin as a Noninvasive Biomarker for PROCHOB caused by CUBN Variants

Inoki, Y.; Horinouchi, T.; Sakakibara, N.; Ishiko, S.; Yamamoto, A.; Aoyama, S.; Kimura, Y.; Ichikawa, Y.; Tanaka, Y.; Kondo, A.; Yamamura, T.; Ishimori, S.; Araki, Y.; Asano, T.; Fujimura, J.; Fujinaga, S.; Hamada, R.; Inoue, N.; Kaito, H.; Kiyota, K.; Kobayashi, A.; Kobayashi, Y.; Kumagai, N.; Miyano, H.; Ohtomo, Y.; Sasaki, S.; Suzuki, R.; Washio, M.; Yamada, Y.; Yamasaki, Y.; Yokoyama, T.; Iijima, K.; Nagano, C.; Nozu, K.

2026-04-01 nephrology 10.64898/2026.03.26.26349155 medRxiv
Top 0.1%
3.5%

Chronic benign proteinuria (PROCHOB), caused by biallelic pathogenic variants in CUBN, presents in childhood as isolated, asymptomatic tubular proteinuria with preserved long-term kidney function. Because its clinical presentation closely mimics early-stage glomerular diseases, with moderate proteinuria and without increased urinary β2-microglobulin (uBMG) or α1-microglobulin, many patients undergo unnecessary kidney biopsies and receive angiotensin-converting enzyme inhibitors or angiotensin II receptor blockers before genetic testing is considered. Using high-throughput aptamer-based urinary proteomics (SomaScan®), we identified urinary myoglobin as a disease-specific biomarker for PROCHOB. We developed and confirmed a diagnostic approach in which the urinary myoglobin-to-creatinine (uMB/Cr) ratio robustly distinguishes PROCHOB from other glomerular kidney diseases with moderate proteinuria. Although some cases of Dent disease, which involves megalin dysfunction, exhibit increased urinary myoglobin levels, PROCHOB and Dent disease can be clearly distinguished by the uBMG-to-creatinine ratio. This biomarker reflects impaired proximal tubular protein reabsorption due to cubilin dysfunction and remains normal in healthy individuals and in those with typical glomerular diseases with moderate proteinuria. Our findings establish a noninvasive diagnostic tool for PROCHOB in which the uMB/Cr and uBMG-to-creatinine ratios prompt targeted genetic testing for CUBN variants. This strategy has the potential to transform the clinical diagnostic pathway for isolated proteinuria.
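The two-ratio logic above (raised uMB/Cr points toward PROCHOB, and a raised uBMG/Cr points instead toward Dent disease) can be written as a simple triage rule. The sketch below is hypothetical: the abstract reports no cutoffs, so both thresholds must be supplied by the user, analyte and creatinine are assumed to be in units that make the ratio comparable to those cutoffs, and the function is an illustration of the reasoning, not a validated clinical rule.

```python
def urine_ratio(analyte, creatinine):
    """Analyte-to-creatinine ratio from a spot urine sample.

    Units are assumed to match the convention of the cutoffs used below.
    """
    if creatinine <= 0:
        raise ValueError("creatinine must be positive")
    return analyte / creatinine

def suggest_cubn_testing(umb_cr, ubmg_cr, umb_cutoff, ubmg_cutoff):
    """Hypothetical triage: raised uMB/Cr without raised uBMG/Cr.

    A raised uBMG/Cr alongside raised uMB/Cr is more consistent with Dent
    disease, so CUBN testing is not suggested in that case. Cutoffs are
    placeholders; none are given in the abstract.
    """
    return umb_cr > umb_cutoff and ubmg_cr <= ubmg_cutoff
```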

19
Predicting COVID-19 incidence from seroprevalence and population-based cohort data using interpretable machine learning with differential privacy analysis

Krepel, J.; Binkyte, R.; Kerkouche, R.; Harries, M.; Klett-Tammen, C. J.; Fritz, M.; Kesselheim, S.; Kuehn, M.; Bazarova, A.; Lange, B.

2026-04-02 epidemiology 10.64898/2026.04.01.26349876 medRxiv
Top 0.1%
3.5%

During the COVID-19 pandemic, reported incidence data played a central role in public health surveillance and in tracking epidemic dynamics, although they provide limited insight into the behavioral, immunological, and socioeconomic drivers of transmission. Population-based seroprevalence studies with linked survey data offer a rich but untapped source of individual-level information that can complement routine surveillance. In this study, we investigate whether aggregated seroprevalence cohort data can be leveraged to predict local COVID-19 incidence and to identify interpretable predictors associated with transmission dynamics. Using data from the Multilocal SeroPrevalence (MuSPAD) study in Germany (2020-2022), we trained multiple machine learning models, including least absolute shrinkage and selection operator (LASSO), vector autoregressive models (VAR), multilayer perceptrons (MLPs), and long short-term memory neural networks (LSTMs), to predict location-specific seven-day incidence rates. Feature importance was assessed using regression coefficients where applicable and model-agnostic explainability methods, including Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). Across model classes, cohort-derived features enabled accurate prediction of local incidence, with time-aware models achieving the strongest performance. Consistent predictors included prior infection and testing history, employment-related changes, vaccination status, and mask-wearing behavior, highlighting the importance of behavioral and reporting-related signals. While differential privacy introduced modest degradation in predictive performance under strict privacy budgets, SHAP-based explanations remained stable, and LIME-based explanations were more sensitive to privacy-induced noise. These results demonstrate that aggregated cohort data encode meaningful and interpretable signals of population-level transmission dynamics.
Population-based serosurveys therefore provide a complementary source of information for predicting local COVID-19 incidence and identifying key drivers of transmission beyond routine surveillance data. Our findings show that integrating interpretable machine learning with privacy-aware analysis enables actionable insights from sensitive cohort data, supporting their use in digital epidemiology and informing data-driven public health decision-making.
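Two building blocks of this setup can be made concrete: the seven-day incidence target (cases over the last seven days per 100,000 inhabitants) and a privacy mechanism applied to an aggregated statistic. The sketch below uses the Laplace mechanism as one standard way to achieve epsilon-differential privacy; the abstract does not say which mechanism the authors used (they may, for example, have privatized model training instead), and the function names are hypothetical.

```python
import numpy as np

def seven_day_incidence(daily_cases, population, per=100_000):
    """Cases reported over the last 7 days per `per` inhabitants."""
    if len(daily_cases) < 7:
        raise ValueError("need at least 7 days of case counts")
    return sum(daily_cases[-7:]) / population * per

def laplace_release(true_value, sensitivity, epsilon, rng=None):
    """Release a statistic with epsilon-DP via the Laplace mechanism.

    Noise scale is sensitivity / epsilon, so a stricter privacy budget
    (smaller epsilon) adds more noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    return float(true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon))
```

For instance, the share of n cohort members reporting mask wearing has sensitivity 1/n when one respondent's answer is replaced, so larger cohorts tolerate a tighter epsilon for the same noise level; this is consistent with the modest degradation reported only under strict privacy budgets.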

20
Impact of proteogenomic evidence on clinical success

Karim, M. A.; Hukku, A.; Ariano, B.; Holzinger, E.; Tsepilov, Y.; Hayhurst, J.; Buniello, A.; McDonagh, E. M.; Castel, S. E.; Nelson, M. R.; Maranville, J.; Yerges-Armstrong, L.; Ghoussaini, M.

2026-02-25 genetic and genomic medicine 10.64898/2026.02.23.26346731 medRxiv
Top 0.1%
3.5%

We assessed the impact of plasma protein quantitative trait loci (pQTL) on therapeutic hypotheses backed by human genetic evidence. We show that pQTL-supported target-indication pairs were 4.7 times more likely to advance from Phase I to launch, compared with a 2.6-fold increase for pairs supported by human genetic evidence alone. Moreover, pQTL-based enrichment was prominent in druggable protein families that showed limited enrichment from human genetic evidence alone.
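A figure such as "4.7 times more likely to advance from Phase I to launch" is typically a relative success: the Phase I-to-launch success rate for pairs with a given type of support divided by the rate for pairs without it. A minimal sketch under that assumption follows; the counts in the usage are hypothetical, and the authors' estimator may include adjustments not shown here.

```python
def success_rate(launched, entered):
    """Fraction of target-indication pairs that went from Phase I to launch."""
    return launched / entered

def relative_success(launched_with, entered_with, launched_without, entered_without):
    """Ratio of Phase I-to-launch success with vs. without a given evidence type."""
    return success_rate(launched_with, entered_with) / success_rate(
        launched_without, entered_without
    )
```

With hypothetical counts of 10 launches out of 100 supported pairs and 5 out of 200 unsupported pairs, the relative success is 0.10 / 0.025 = 4.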